Extended Performance Graphs for Cluster Retrieval

Authors

Dionysius P. Huijsmans

Nicu Sebe

Abstract

Performance evaluations in Probabilistic Information Retrieval are often presented as Precision-Recall or Precision-Scope graphs, avoiding the otherwise dominating effect of the embedding irrelevant fraction. However, precision and recall values as such offer an incomplete overview of the information retrieval system under study: information about system parameters such as generality (the embedding of the relevant fraction), random performance, and the effect of varying the scope is missing. In this paper three cluster performance graphs are presented. Where complete ground truth is available (both cluster size and database size), the Cluster Precision-Recall (Cluster PR) graph and the Generality-Precision=Recall graph are proposed. Where cluster sizes are unknown (and recall therefore cannot be determined), the double logarithmic Cluster Precision Window graph is proposed.

1 Shortcomings of presently used retrieval performance measures

Performance characterization of content-based image and audio retrieval often borrows from performance figures developed over the past 30 years for probabilistic text retrieval. Landmarks in the text retrieval field are the books [12] and [11] as well as the proceedings of the annual ACM SIGIR [7] and NIST TREC [14] conferences. In the area of probabilistic retrieval, the results of performance measurements are often presented in the form of Precision-Recall (or Recall-Precision) graphs and Precision-Scope graphs. Each of these standard performance graphs gives the user incomplete information about how the IR system will perform for various cluster sizes and various embedding sizes. Generality (the influence of the relevant fraction) as a system parameter hardly seems to play a role in performance analysis.
Although generality may be left out as a performance indicator when competing methods are tested under constant generality conditions, it appears to be neglected even in cases where generality varies widely (a wide range of cluster sizes within one specific database is the most frequently encountered example). That the generality of a cluster of relevant items in a large embedding database is often ≈ 0.0 does not mean that its exact low level no longer matters. A continually growing embedding around a constant-size cluster of relevant items will eventually lower the overall precision-result curve (for the user) to unacceptably low levels, as shown in Figure 1.
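To make the quantities discussed above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual definitions: precision is the number of relevant items retrieved divided by the scope, recall is the number of relevant items retrieved divided by the cluster size, and generality is the cluster size divided by the database size. The function names, the boolean ranked list, and the example sizes are all hypothetical and chosen only for illustration.

```python
def retrieval_measures(ranked_relevance, cluster_size, database_size, scope):
    """Return (precision, recall, generality) for the top `scope` results.

    ranked_relevance: list of booleans, True where the item at that rank
    belongs to the query's cluster (hypothetical input format).
    """
    if not 0 < scope <= len(ranked_relevance):
        raise ValueError("scope must lie between 1 and the length of the ranking")
    if not 0 < cluster_size <= database_size:
        raise ValueError("cluster_size must lie between 1 and database_size")

    hits = sum(1 for relevant in ranked_relevance[:scope] if relevant)
    precision = hits / scope                    # relevant retrieved / items retrieved
    recall = hits / cluster_size                # relevant retrieved / all relevant
    generality = cluster_size / database_size   # relevant fraction of the database
    return precision, recall, generality


def expected_random_precision(cluster_size, database_size):
    """Expected precision of a uniformly random ranking, which equals generality."""
    return cluster_size / database_size


# A constant cluster of 8 relevant items inside a growing embedding:
# the random-retrieval baseline shrinks in proportion to the database size.
baselines = {n: expected_random_precision(8, n) for n in (100, 1_000, 10_000)}
```

The baseline only tracks random retrieval. How far a real system stays above it as the embedding grows depends on the system, which is why reporting generality alongside precision and recall is informative.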


Similar resources


Sparse Clustering for Probability Un-Weighted Graphs Mining

Probabilistic graphs have significant importance in data mining. The correlations endure amid the adjacent edges in different probabilistic graphs. Graph clustering is used in exploratory data analysis at data compression, information retrieval and image segmentation. The existing work presented a Partially Expected Edit Distance Reduction (PEEDR) and Correlated Probabilistic Graphs Spectral (...


An Effective Path-aware Approach for Keyword Search over Data Graphs

Keyword search is known as a user-friendly alternative to structured languages for retrieving information from graph-structured data. Efficient retrieval of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...


Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models (arXiv:0804.3599v1 [cs.IR], 22 Apr 2008)

We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via c...


Artificial Intelligence and Query Execution Methods in the VizIR Framework

The article introduces the architecture of the querying components of the visual information retrieval framework VizIR. A major design goal was to assure adaptability and extensibility in manifold ways. VizIR components can be arbitrarily combined to build extensive applications. The framework provides various visual content descriptors, similarity measures and query models. Moreover, the platf...



Publication date: 2001
